Quantitative Big Imaging

Many Objects and Distributions

ETHZ: 227-0966-00L


Course Outline

Literature / Useful References

Books

Literature (Continued)

Papers / Sites

Previously on QBI …

Outline

Motivation (Why and How?)

Global Environment

Metrics

We examine a number of different metrics in this lecture. In addition to classifying them as Local or Global, we can also classify them as point or voxel-based operations.

Point Operations

x y z
2 3 4
1 1 3
1 0 4
0 0 4

Voxel Operation

What do we start with?

Going back to our original cell image

  1. We have been able to get rid of the noise in the image and find all the cells (lecture 2-4)
  2. We have analyzed the shape of the cells using the shape tensor (lecture 5)
  3. We even separated cells joined together using Watershed (lecture 6)

We can characterize the sample by the average and standard deviation of volume, orientation, surface area, and other metrics.

Motivation (Why and How?)

With all of these images, the first step is always to understand exactly what we are trying to learn from our images.

Figure: All Cells

  1. We want to know how many cells are alive
  2. We want to know where the cells are alive or most densely packed

Motivation (continued)

Figure: All Cells

  1. We want to know how the cells are communicating

Motivation (continued)

Figure: All Cells

  1. We want to know how the cells are nourished

So what do we still need

  1. A way for counting cells in a region and estimating density without creating arbitrary boxes
  2. A way for finding out how many cells are near a given cell, its nearest neighbors
  3. A way for quantifying how far apart cells are and then comparing different regions within a sample
  4. A way for quantifying and comparing orientations

What would be really great?

A tool which could be adapted to answering a large variety of problems

- multiple types of structures
- multiple phases

Destructive Measurements

With most imaging techniques and sample types, the task of measurement itself impacts the sample.

- Even techniques like X-ray tomography which claim to be non-destructive still impart significant to lethal doses of X-ray radiation for high resolution imaging
- Electron microscopy, auto-tome-based methods, and histology are all markedly more destructive and make longitudinal studies impossible
- Even when such measurements are possible
  - Registration can be a difficult task and introduce artifacts

Why is this important?

Ok, so now what?

Figure: Smaller Region

\[ \downarrow \]

x y vx vy
20.19 10.69 -0.95 -0.30
20.19 10.69 0.30 -0.95
293.08 13.18 -0.50 0.86
293.08 13.18 -0.86 -0.50
243.81 14.23 0.68 0.74
243.81 14.23 -0.74 0.68

\[ \cdots \]

So if we want to know the mean or standard deviation of the positions or orientations, we can analyze them easily.

Metric     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
x          6.90   215.70   280.50   258.20   339.00   406.50
y         10.69   111.60   221.00   208.60   312.50   395.20
Length     1.06     1.57     1.95     2.08     2.41     4.33
vx        -1.00    -0.94    -0.70    -0.42     0.07     0.71
vy        -1.00    -0.70     0.02     0.04     0.71     1.00
Theta   -180.00  -134.10    -0.50    -4.67   130.60   177.70

Simple Statistics

When given a group of data, it is common to take a mean value since this is easy, for example: the mean bone thickness is 0.3mm. This is particularly attractive for groups with many samples because a single mean is a much smaller description than the full list of individual points.

but means can lie

some means are not very useful

Calculating Density

One of the first metrics to examine with a distribution is density \(\rightarrow\) how many objects are in a given region or volume.

It is deceptively easy to calculate: it is simply the number of objects divided by the volume.

Figure: Grid Nearest Neighbor

On its own it does not tell us much, since many very different systems can have the same density. And what if we want the density at a single point? Does that even make sense?

Figure: Grid Nearest Neighbor

Neighbors

Definition

Oxford American \(\rightarrow\) be situated next to or very near to (another)

- Does not sound very scientific
- How close?
- Touching, closer than anything else?

Nearest Neighbor (distance)

Given a set of objects with centroids at \[ \textbf{P}=\begin{bmatrix} \vec{x}_0,\vec{x}_1,\cdots,\vec{x}_i \end{bmatrix} \]

We can define the nearest neighbor of a point \(\vec{y}\) as the position of the object in our set which is closest to it

\[ \vec{\textrm{NN}}(\vec{y}) = \textrm{argmin}(||\vec{y}-\vec{x}|| \forall \vec{x} \in \textbf{P}-\vec{y}) \]

We define the distance as the Euclidean distance from the current point to that point, and the angle as the orientation of the vector pointing from the current point to its nearest neighbor

\[ \textrm{NND}(\vec{y}) = \textrm{min}(||\vec{y}-\vec{x}|| \forall \vec{x} \in \textbf{P}-\vec{y}) \]

\[ \textrm{NN}\theta(\vec{y}) = \tan^{-1}\frac{(\vec{\textrm{NN}}-\vec{y})\cdot \vec{j}}{(\vec{\textrm{NN}}-\vec{y})\cdot \vec{i}} \]
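The three definitions above can be sketched in a few lines of Python. This is a minimal brute-force illustration, not the course implementation; the function name `nearest_neighbor` and the sample points are made up for the example, and `atan2` is used in place of a plain arctangent so that the quadrant of the angle is kept.

```python
import math

def nearest_neighbor(points, index):
    """Return (neighbor_index, NND, NN-theta in degrees) for points[index]."""
    yx, yy = points[index]
    best = None
    for k, (x, y) in enumerate(points):
        if k == index:
            continue  # search over P without the point itself
        d = math.hypot(x - yx, y - yy)  # Euclidean distance
        if best is None or d < best[1]:
            best = (k, d)
    k, d = best
    # angle of the vector from the current point to its nearest neighbor
    theta = math.degrees(math.atan2(points[k][1] - yy, points[k][0] - yx))
    return k, d, theta

# hypothetical centroids
points = [(0.0, 0.0), (3.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
nn_index, nnd, nn_theta = nearest_neighbor(points, 0)
```

For large point sets a spatial index (for example a k-d tree) would normally replace the double loop; the brute-force form is kept here because it mirrors the equations directly.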

Nearest Neighbor Definition

So examining a simple starting system like a grid, we already start running into issues.

- In a perfect grid-like structure each object has 4 equidistant neighbors (6 in 3D)
- Which one is closest?

We thus add an additional clause (only relevant for simulated data) where if there are multiple equidistant neighbors, a nearest is chosen randomly

This ensures when we examine the orientation distribution (NN\(\theta\)) of the neighbors it is evenly distributed
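A possible way to express this tie-breaking rule is sketched below. The helper name `nearest_neighbor_random_tie`, the tolerance `tol`, and the 3 x 3 grid are assumptions made for the illustration.

```python
import math
import random

def nearest_neighbor_random_tie(points, index, rng, tol=1e-9):
    """Pick one nearest neighbor at random among all equidistant candidates."""
    yx, yy = points[index]
    dists = [(math.hypot(x - yx, y - yy), k)
             for k, (x, y) in enumerate(points) if k != index]
    d_min = min(d for d, _ in dists)
    # every point whose distance matches the minimum (within tol) is a candidate
    candidates = [k for d, k in dists if d - d_min <= tol]
    return rng.choice(candidates), d_min, candidates

# 3 x 3 unit grid: the center point has four equidistant neighbors
grid = [(float(i), float(j)) for i in range(3) for j in range(3)]
center = grid.index((1.0, 1.0))
rng = random.Random(0)  # seeded so the example is repeatable
chosen, d_min, candidates = nearest_neighbor_random_tie(grid, center, rng)
```

Without the random choice, a fixed search order would always return the same one of the four neighbors, and the orientation histogram would show a single spike.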

Figure: Grid Nearest Neighbor

In-Silico Systems

For the rest of these sections we will repeatedly use several simple in-silico systems to test our methods and try to better understand the kind of results we obtain from them.

\[ \begin{bmatrix} x^\prime \\ y^\prime \end{bmatrix} = \alpha \begin{bmatrix} x \\ y \end{bmatrix} \]

\[ \begin{bmatrix} x^\prime \\ y^\prime \end{bmatrix} = \begin{bmatrix} 1 & \alpha \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \]

In-Silico Systems (Continued)

\[ \begin{bmatrix} x^\prime \\ y^\prime \end{bmatrix} = \begin{bmatrix} \textrm{sign}(x) \left(\frac{|x|}{m}\right)^\alpha m \\ \textrm{sign}(y) \left(\frac{|y|}{m}\right)^\alpha m \end{bmatrix} \]

\[ \theta (x,y) = \alpha \sqrt{x^2+y^2} \]

\[ \begin{bmatrix} x^\prime \\ y^\prime \end{bmatrix} = \begin{bmatrix} \cos\theta(x,y) & -\sin\theta(x,y) \\ \sin\theta(x,y) & \cos\theta(x,y) \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \]
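The transformations above can be written as small Python functions. Reading the equations, in order, as compression, shear, stretch, and swirl is an assumption based on the slide titles that follow; the function names are hypothetical, and `stretch` assumes \(\alpha > 0\) and \(m > 0\).

```python
import math

def compress(x, y, alpha):
    # uniform scaling of both coordinates by alpha
    return alpha * x, alpha * y

def shear(x, y, alpha):
    # x is shifted in proportion to y, y is unchanged
    return x + alpha * y, y

def stretch(x, y, alpha, m):
    # power-law rescaling of each coordinate, normalized by m, sign preserved
    def s(v):
        return math.copysign((abs(v) / m) ** alpha * m, v)
    return s(x), s(y)

def swirl(x, y, alpha):
    # rotation whose angle grows linearly with the distance from the origin
    theta = alpha * math.hypot(x, y)
    c, sn = math.cos(theta), math.sin(theta)
    return c * x - sn * y, sn * x + c * y

compressed = compress(2.0, 4.0, 0.5)
sheared = shear(1.0, 2.0, 0.5)
stretched = stretch(-0.5, 1.0, 2.0, 1.0)
swirled = swirl(3.0, 4.0, 0.1)
```

Because the swirl is a pure rotation at every radius, it changes directions between points but leaves each point's distance from the origin untouched.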

Examining Compression

Figures: Uniaxially Stretched; Grid Nearest Neighbor

Compression Distributions

Figures: Length Distribution; NN Orientation

Examining Different Shears

Figures: Uniaxially Stretched; Grid Nearest Neighbor

Shear Distributions

Figures: Length Distribution; NN Orientation

Examining Different Stretches

Figures: Uniaxially Stretched; Grid Nearest Neighbor

Stretch Distributions

Figures: Length Distribution; NN Orientation

Examining Swirl Systems

Figure: Grid Nearest Neighbor

Swirl NN Distributions

Figures: Length Distribution; NN Orientation

What we notice

We notice there are several fairly significant shortcomings of these metrics (particularly with in-silico systems)

  1. Orientation appears to be useful but random
  2. Single outlier objects skew results
  3. We only extract one piece of information
  4. Difficult to create metrics

Luckily we are not the first people to address this issue

Random Systems

Using a uniform grid of points as a starting point has a strong influence on the results. A better approach is to use a randomly distributed series of points, which

- resembles real data much better
- avoids these symmetry problems
  - \(\epsilon\) sized edges or overlaps
  - identical distances to nearby objects

Figure: Maximum Stretch

Examining Compression

Figures: Uniaxially Stretched; Grid Nearest Neighbor

Compression Distributions

Figures: Length Distribution; NN Orientation

Examining Different Shears

Figures: Uniaxially Stretched; Grid Nearest Neighbor

Shear Distributions

Figures: Length Distribution; NN Orientation

Examining Different Stretches

Figures: Uniaxially Stretched; Grid Nearest Neighbor

Stretch Distributions

Figures: Length Distribution; NN Orientation

Examining Swirl Systems

Figure: Grid Nearest Neighbor

Swirl NN Distributions

Figures: Length Distribution; NN Orientation

Voronoi Tessellation

Voronoi tessellation is a method for partitioning a space based on points. The basic idea is that each point \(\vec{p}\) is assigned a region \(\textbf{R}\) consisting of all locations which are closer to \(\vec{p}\) than to any of the other points. In the diagram, the region boundaries are drawn as dashed lines and the points as small circles.

Figure: NN Orientation

We call the area of a region (\(\textbf{R}\)) around point \(\vec{p}\) its territory.

Compared with the grid, the random system shows much more diversity in territory area.

Figure: NN Orientation

Calculating Density

Back to our original density problem of having just one number to broadly describe the system.

- Can a Voronoi tessellation help us with this?
- YES

With density we calculated

\[ \textrm{Density} = \frac{\textrm{Number of Objects}}{\textrm{Total Volume}} \]

With the regions we have a territory (volume) per object, so the average territory is

\[ \overline{\textrm{Territory}} = \frac{\sum \textrm{Territory}_i}{\textrm{Number of Objects}} = \frac{\textrm{Total Volume}}{\textrm{Number of Objects}} = \frac{1}{\textrm{Density}} \]

So the average carries the same information as before, but we now have a density definition for a single point!

\[ \textrm{Density}_i = \frac{1}{\textrm{Territory}_i} \]
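The relation between territory and density can be checked numerically. The sketch below swaps the geometric Voronoi construction for a simpler raster approximation: every pixel of a bounded image is assigned to its nearest point, and the territory is the pixel count times the pixel area. The function name `territories`, the 4 x 4 domain, the `step` size, and the points are all assumptions for the illustration.

```python
def territories(points, width, height, step=0.1):
    """Approximate Voronoi territory areas by assigning pixels to the nearest point."""
    nx, ny = round(width / step), round(height / step)
    counts = [0] * len(points)
    for ix in range(nx):
        for iy in range(ny):
            px, py = (ix + 0.5) * step, (iy + 0.5) * step  # pixel center
            nearest = min(
                range(len(points)),
                key=lambda k: (points[k][0] - px) ** 2 + (points[k][1] - py) ** 2,
            )
            counts[nearest] += 1
    return [c * step * step for c in counts]

# hypothetical points inside a 4 x 4 region
points = [(1.0, 1.0), (3.0, 1.0), (1.0, 3.0), (3.5, 3.5)]
ter = territories(points, 4.0, 4.0)
density_i = [1.0 / t for t in ter]           # per-point density
mean_ter = sum(ter) / len(ter)               # average territory
global_density = len(points) / (4.0 * 4.0)   # objects per total area
```

Note that territories touching the border are clipped by the image boundary, so edge objects usually need separate treatment in real data.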

Density Examples

Figure: Grid Nearest Neighbor

Figure: Grid Nearest Neighbor

Delaunay Triangulation

A parallel or dual idea, where triangles are used and each triangle is created such that its circumscribed circle (the circle passing through its three corners) contains no other points. The triangulation makes the neighbors explicit, since connected points in the triangulation correspond to points in our tessellation whose regions share an edge (or a face in 3D).

Figure: NN Orientation

We define the number of connections each point \(\vec{p}\) has as the Neighbor Count or Delaunay Neighbor Count.
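The empty-circle rule and the resulting neighbor count can be illustrated with a brute-force sketch: every triple of points is tested, and a triangle is kept only if no other point lies inside its circumscribed circle. This scales very poorly and is only meant for a handful of points; the function name `delaunay_neighbor_counts`, the tolerance `eps`, and the five example points are assumptions.

```python
from itertools import combinations

def delaunay_neighbor_counts(points, eps=1e-9):
    """Count Delaunay neighbors per point via the empty-circumcircle test."""
    neighbors = [set() for _ in points]
    for i, j, k in combinations(range(len(points)), 3):
        (ax, ay), (bx, by), (cx, cy) = points[i], points[j], points[k]
        d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
        if abs(d) < eps:
            continue  # collinear points have no circumscribed circle
        a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
        # circumcenter (ux, uy) and squared radius r2
        ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
        uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
        r2 = (ax - ux) ** 2 + (ay - uy) ** 2
        empty = all(
            (px - ux) ** 2 + (py - uy) ** 2 >= r2 - eps
            for n, (px, py) in enumerate(points) if n not in (i, j, k)
        )
        if empty:
            # the three sides of a valid triangle are neighbor connections
            for p, q in ((i, j), (j, k), (i, k)):
                neighbors[p].add(q)
                neighbors[q].add(p)
    return [len(s) for s in neighbors]

# four rectangle corners plus the center point
points = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (2.0, 1.5)]
counts = delaunay_neighbor_counts(points)
```

In this example the center point is connected to all four corners, while each corner is connected to its two adjacent corners and the center.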

The triangulation on a random system has a much higher diversity in neighbor count.

Figure: NN Orientation

Compression System

Compression

Figure: Maximum Stretch

Tension

Figure: Maximum Stretch

Shear System

Low Shear

Figure: Maximum Stretch

High Shear

Figure: Maximum Stretch

Stretch System

Low Stretch

Figure: Maximum Stretch

Highly Stretched System

Figure: Maximum Stretch

Swirl System

Low Swirl System

Figure: Maximum Stretch

High Swirl System

Figure: Maximum Stretch

Neighborhoods

Compression

Figure: Maximum Stretch

Stretch

Figure: Maximum Stretch

Shear

Figure: Maximum Stretch

Swirl

Figure: Maximum Stretch

Neighbor Count

Figure: Maximum Stretch

Volume

Figure: Maximum Stretch

Mean vs Variability

Figure: Maximum Stretch

Figure: Maximum Stretch

Where are we at?

We have introduced a number of “operations” we can perform on our objects to change their positions

- compression
- stretching
- shearing
- swirling

We have introduced a number of metrics to characterize our images

- Nearest Neighbor distance
- Nearest Neighbor angle
- Delaunay Neighbor count
- Territory Area (Volume)

A single random system is useful, but in order to have a reasonable understanding of the behavior of a system we need to sample many of them.

Understand metrics as a random system + a known transformation

Understanding Metrics

In imaging science we always end up with lots of data; the tricky part is understanding the results that come out. With this simulation-based approach we

- generate completely random data
- apply a known transformation \(\mathcal{F}\) to it
- quantify the results

We can then take this knowledge and use it to interpret observed data as transformations on an initially random system. We try to find the rules used to produce the sample.
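The simulation-based approach can be sketched end to end: random points are generated, a known transformation (here a uniform compression with \(\alpha = 0.5\)) is applied, and a metric (the mean nearest neighbor distance) is compared before and after. The helper `mean_nnd`, the seed, the point count, and the value of `alpha` are arbitrary choices for the illustration.

```python
import math
import random

def mean_nnd(points):
    """Mean nearest neighbor distance by brute force."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(math.hypot(xi - xj, yi - yj)
                     for j, (xj, yj) in enumerate(points) if j != i)
    return total / len(points)

rng = random.Random(42)  # seeded for repeatability
random_points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]

alpha = 0.5  # known transformation: uniform compression
compressed = [(alpha * x, alpha * y) for x, y in random_points]

# how the metric responds to the transformation
ratio = mean_nnd(compressed) / mean_nnd(random_points)
```

Since uniform compression scales every distance by the same factor, the ratio equals `alpha`; observing such a ratio in measured data would then point toward a compression-like process.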

Examples

  1. Cell distribution in bone
  2. Egg-shell Pores

Compression

Figure: Maximum Stretch

Compression Sensitivity

Figure: Maximum Stretch

Stretching

Figure: Maximum Stretch

Stretching Sensitivity

Figure: Maximum Stretch

Shearing

Figure: Maximum Stretch

Shearing Sensitivity

Figure: Maximum Stretch

Swirling

Figure: Maximum Stretch

Swirling Sensitivity

Figure: Maximum Stretch

Self-Avoiding

From the nearest neighbor distance metric, we can create a scale-free version of the metric which we call the self-avoiding coefficient or grouping.

The metric is the ratio of

- the observed nearest neighbor distance \(\textrm{NND}\)
- the expected mean nearest neighbor distance (\(r_0\)) for a random point distribution (Poisson Point Process) with the same number of points (N.Obj) per volume (Total.Volume)

\[r_0=\sqrt[3]{\frac{\textrm{Total.Volume}}{2\pi \textrm{ N.Obj}}}\]

Using the territory we defined earlier (Region area/volume) we can simplify the definition to

\[ r_0=\sqrt[3]{\frac{\bar{\textrm{Ter}}}{2\pi}} \]

\[ \textrm{SAC} = \frac{\textrm{NND}}{\sqrt[3]{\frac{\bar{\textrm{Ter}}}{2\pi}}} \]
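As a small worked example of these formulas, the sketch below evaluates \(r_0\) and the self-avoiding coefficient exactly as written above. The function names and the numbers (125 objects in a volume of 1000, an observed nearest neighbor distance of 2.0) are hypothetical.

```python
import math

def expected_nnd(total_volume, n_objects):
    # r_0 as defined above: cube root of Total.Volume / (2 * pi * N.Obj)
    return (total_volume / (2.0 * math.pi * n_objects)) ** (1.0 / 3.0)

def self_avoiding_coefficient(nnd, mean_territory):
    # SAC = NND / cube root of (mean territory / (2 * pi))
    return nnd / (mean_territory / (2.0 * math.pi)) ** (1.0 / 3.0)

total_volume, n_objects = 1000.0, 125
mean_territory = total_volume / n_objects  # average volume per object
r0 = expected_nnd(total_volume, n_objects)

# objects on a regular cubic lattice with this density sit 2.0 apart
sac_regular = self_avoiding_coefficient(2.0, mean_territory)
```

With this definition, values above 1 suggest objects that keep further apart than a random arrangement would (self-avoiding), and values below 1 suggest grouping.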

Distribution Tensor

So the information we have is 3D; why are we taking single metrics (distance, angle, volume) to quantify it?

- Shouldn’t we use 3D metrics with 3D data?
- Just like the shape tensor we covered before, we can define a distribution tensor to characterize the shape of the distribution.
- The major difference is that instead of constituting voxels we use edges
  - an edge is defined from the Delaunay triangulation
  - it connects two neighboring objects together

We start off by calculating the covariance matrix from the list of edges \(\vec{v}_{ij}\) in a given volume \(\mathcal{V}\), where each edge is the difference between the centers of mass (COM) of two neighboring objects

\[ \vec{v}_{ij} = \vec{\textrm{COM}}(i)-\vec{\textrm{COM}}(j) \]

\[ COV(\mathcal{V}) = \frac{1}{N} \sum_{\forall \textrm{COM}(i) \in \mathcal{V}} \begin{bmatrix} \vec{v}_x\vec{v}_x & \vec{v}_x\vec{v}_y & \vec{v}_x\vec{v}_z\\ \vec{v}_y\vec{v}_x & \vec{v}_y\vec{v}_y & \vec{v}_y\vec{v}_z\\ \vec{v}_z\vec{v}_x & \vec{v}_z\vec{v}_y & \vec{v}_z\vec{v}_z \end{bmatrix} \]

Distribution Tensor (continued)

We then take the eigendecomposition of this matrix to obtain the eigenvectors (principal components, \(\vec{\Lambda}_{1\cdots 3}\)) and eigenvalues (scores, \(\lambda_{1\cdots 3}\))

\[ COV(\mathcal{V}) \longrightarrow \underbrace{\begin{bmatrix} \vec{\Lambda}_{1x} & \vec{\Lambda}_{1y} & \vec{\Lambda}_{1z} \\ \vec{\Lambda}_{2x} & \vec{\Lambda}_{2y} & \vec{\Lambda}_{2z} \\ \vec{\Lambda}_{3x} & \vec{\Lambda}_{3y} & \vec{\Lambda}_{3z} \end{bmatrix}}_{\textrm{Eigenvectors}} * \underbrace{\begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{bmatrix}}_{\textrm{Eigenvalues}} * \underbrace{\begin{bmatrix} \vec{\Lambda}_{1x} & \vec{\Lambda}_{1y} & \vec{\Lambda}_{1z} \\ \vec{\Lambda}_{2x} & \vec{\Lambda}_{2y} & \vec{\Lambda}_{2z} \\ \vec{\Lambda}_{3x} & \vec{\Lambda}_{3y} & \vec{\Lambda}_{3z} \end{bmatrix}^{T}}_{\textrm{Eigenvectors}} \]

The principal components tell us about the orientation of the object and the scores tell us about the corresponding magnitude (or length) in that direction.
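To make the two steps concrete, here is a reduced 2D sketch: the tensor is built from a list of edge vectors as the average of their outer products, and its eigenvalues and major axis are obtained with the closed-form solution for a symmetric 2 x 2 matrix. The restriction to 2D, the function names, and the four example edges are assumptions; a 3D version would typically rely on a numerical eigen-solver.

```python
import math

def edge_tensor_2d(edges):
    """Average outer product of the edge vectors (entries cxx, cxy, cyy)."""
    n = len(edges)
    cxx = sum(vx * vx for vx, vy in edges) / n
    cxy = sum(vx * vy for vx, vy in edges) / n
    cyy = sum(vy * vy for vx, vy in edges) / n
    return cxx, cxy, cyy

def eigen_sym_2x2(cxx, cxy, cyy):
    """Eigenvalues (small, large) and major-axis direction of a symmetric 2x2 matrix."""
    mean = 0.5 * (cxx + cyy)
    half_diff = math.hypot(0.5 * (cxx - cyy), cxy)
    lam_small, lam_large = mean - half_diff, mean + half_diff
    angle = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)  # direction of the larger score
    return lam_small, lam_large, (math.cos(angle), math.sin(angle))

# hypothetical edges: neighbors are twice as far apart in x as in y
edges = [(2.0, 0.0), (-2.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
cxx, cxy, cyy = edge_tensor_2d(edges)
lam_small, lam_large, major_axis = eigen_sym_2x2(cxx, cxy, cyy)
```

Each edge appears here in both directions, which does not matter: the products in the tensor are unchanged when a vector is negated.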

Distribution Anisotropy

Visual example

- The tensor represents the average spacing between objects in each direction \(\approx\) thickness of background.
- Its interpretation is more difficult since it doesn’t represent a real object

Figure: Distribution Tensor

From this tensor we can define an anisotropy in the same manner as we defined for shapes. The anisotropy is defined as before

\[ Aiso = \frac{\text{Longest Side}-\text{Shortest Side}}{\text{Longest Side}} \]

Distribution Oblateness

Figure: Distribution Tensor

From this tensor we can also define oblateness in the same manner as we defined for shapes. The oblateness is also defined as before, as a type of anisotropy.

\[ \textrm{Ob} = 2\frac{\lambda_{2}-\lambda_{1}}{\lambda_{3}-\lambda_{1}}-1 \]
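Both shape descriptors reduce to a few arithmetic operations on the scores. The sketch below makes two assumptions that the slides do not state explicitly: the eigenvalues themselves are used as the "sides", and the oblateness formula takes them in ascending order (\(\lambda_1 \le \lambda_2 \le \lambda_3\)). The function names are hypothetical.

```python
def anisotropy(lams):
    """(longest - shortest) / longest, using the scores as side lengths."""
    longest, shortest = max(lams), min(lams)
    return (longest - shortest) / longest

def oblateness(lams):
    """2 * (l2 - l1) / (l3 - l1) - 1 with l1 <= l2 <= l3 (assumed ordering)."""
    l1, l2, l3 = sorted(lams)
    return 2.0 * (l2 - l1) / (l3 - l1) - 1.0

iso = anisotropy([1.0, 1.0, 1.0])      # identical spacing in all directions
aniso = anisotropy([4.0, 2.0, 1.0])
rod_like = oblateness([1.0, 1.0, 4.0])      # one long axis
pancake_like = oblateness([1.0, 4.0, 4.0])  # two long axes
```

Under this ordering the oblateness runs from -1 (one dominant direction) to +1 (two dominant directions); it is undefined for a perfectly isotropic tensor because the denominator vanishes.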

Orientation

The shape tensor provides for each object 3 possible orientations (each of the eigenvectors). For simplicity we will take the primary direction (but the others can be taken as well, and particularly in oblate or pancake shaped objects the first is probably not the best choice!)

Figure: Grid Nearest Neighbor

Orientation

An orientation derived from a shape tensor / ellipsoid model has no head or tail, so it is only defined up to a sign: \(\longrightarrow == \longleftarrow\) and \(\uparrow == \downarrow\).

Figure: Primary Orientation

This means the average and standard deviation of the angle are very poor descriptors of the actual dataset. The average for all samples below is around 90 degrees (vertical) even though almost no samples are vertical, and the first sample shows a very high standard deviation (around 90 degrees) even though in reality its orientations are all essentially the same.

Angle.Variability  Mean.Angle  Sd.Angle
5.729578           103.63067   87.94489
62.864789          101.25924   116.40803
120.000000         88.66128    168.55473

The problem can be dealt with by using the covariance matrix: every entry is a product of two components of the same vector, so flipping the sign of a vector leaves the final answer unchanged.

Alignment Tensor

We can again take advantage of the versatility of a tensor representation for our data and use an alignment tensor.

\[ \vec{v}_{i} = \vec{\Lambda_1}(i) \]

\[ COV = \frac{1}{N} \sum_{\forall \textrm{COM}(i) \in \mathcal{V}} \begin{bmatrix} \vec{v}_x\vec{v}_x & \vec{v}_x\vec{v}_y & \vec{v}_x\vec{v}_z\\ \vec{v}_y\vec{v}_x & \vec{v}_y\vec{v}_y & \vec{v}_y\vec{v}_z\\ \vec{v}_z\vec{v}_x & \vec{v}_z\vec{v}_y & \vec{v}_z\vec{v}_z \end{bmatrix} \]
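A short 2D sketch shows why the tensor form is insensitive to sign. Orientations are given as angles, converted to unit vectors, averaged as outer products, and summarized by the anisotropy of the two eigenvalues. The 2D restriction, the function name `alignment_anisotropy`, and the example angles are assumptions for the illustration.

```python
import math

def alignment_anisotropy(angles_deg):
    """Anisotropy of the 2D alignment tensor built from unit direction vectors."""
    n = len(angles_deg)
    vs = [(math.cos(math.radians(a)), math.sin(math.radians(a))) for a in angles_deg]
    cxx = sum(vx * vx for vx, vy in vs) / n
    cxy = sum(vx * vy for vx, vy in vs) / n
    cyy = sum(vy * vy for vx, vy in vs) / n
    # closed-form eigenvalues of a symmetric 2x2 matrix
    mean = 0.5 * (cxx + cyy)
    half_diff = math.hypot(0.5 * (cxx - cyy), cxy)
    lam_large, lam_small = mean + half_diff, mean - half_diff
    return (lam_large - lam_small) / lam_large

# the same axis, reported with arbitrary sign (10 and 190 degrees)
same_axis = [10.0, 190.0, 10.0, 190.0]
naive_mean = sum(same_axis) / len(same_axis)  # misleading average angle
aligned = alignment_anisotropy(same_axis)

# orientations spread evenly over all directions
spread = alignment_anisotropy([0.0, 45.0, 90.0, 135.0])
```

The naive mean lands on a direction that none of the objects has, while the tensor-based value is close to 1 for the aligned sample and close to 0 for the evenly spread one.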

Figure: Alignment Tensor

Alignment Tensor: Example

Using the example from before

Figure: Grid Nearest Neighbor


Alignment Anisotropy

Anisotropy for alignment can be summarized as degree of alignment since very anisotropic distributions mean the objects are aligned well in the same direction while an isotropic distribution means the orientations are random. Oblateness can also be defined but is normally not particularly useful.

\[ Aiso = \frac{\text{Longest Side}-\text{Shortest Side}}{\text{Longest Side}} \]

Figure: Alignment Tensor

Alignment Anisotropy Applied

Figure: Alignment Anisotropy

Figure: Alignment Histograms

Angle.Variability  Mean.Angle  Sd.Angle   Alignment
5.729578           103.63067   87.94489   99.03653
62.864789          101.25924   116.40803  20.44900
120.000000         88.66128    168.55473  14.73610

Alignment for many samples

Figure: Alignment Arrows

Angle Accuracy

Figure: Alignment Correlation

Variability Accuracy

Figure: Alignment Correlation

Other Approaches

K-Means

K-Means can also be used to classify the point-space itself after shape analysis. It is even better suited to this than to images, because while most images are only 2D or 3D, the shape vector space can be 50-60 dimensional and is inherently much more difficult to visualize.

2 Point Correlation Functions

For a wider class of analysis of spatial distribution, there exists a class of functions called N-point Correlation Functions

- given a point of type \(A\) at \(\vec{x}\)
- what is the probability of \(B\) at \(\vec{x}+\vec{r}\)
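For the two-point case, this question can be answered directly on a labeled image by counting. The sketch below is one simple estimator among several possible ones: for a fixed offset \(\vec{r}\) it visits every pixel of type \(A\), looks at the pixel shifted by \(\vec{r}\), and reports the fraction that are of type \(B\). The function name `two_point_probability`, the striped test image, and the choice to skip pairs that fall outside the image are assumptions.

```python
def two_point_probability(image, type_a, type_b, offset):
    """Estimate P(type_b at x + r | type_a at x) for one offset r = (dx, dy)."""
    dx, dy = offset
    hits = total = 0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value != type_a:
                continue
            x2, y2 = x + dx, y + dy
            # only count pairs where the shifted pixel is inside the image
            if 0 <= y2 < len(image) and 0 <= x2 < len(row):
                total += 1
                hits += image[y2][x2] == type_b
    return hits / total if total else float("nan")

# vertical stripes, two pixels wide: 0 0 1 1 0 0 1 1
stripes = [[(x // 2) % 2 for x in range(8)] for _ in range(4)]
p_half_period = two_point_probability(stripes, 0, 1, (2, 0))
p_full_period = two_point_probability(stripes, 0, 1, (4, 0))
```

Repeating the calculation over many offsets gives the full correlation function, which for this image oscillates with the stripe period.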